How to Intelligently Distribute Training Data to Multiple Compute Nodes: Distributed Machine Learning via Submodular Partitioning
نویسندگان
چکیده
In this paper we investigate the problem of training data partitioning for parallel learning of statistical models. Motivated by [10], we utilize submodular functions to model the utility of data subsets for training machine learning classifiers and formulate this problem mathematically as submodular partitioning. We introduce a simple and scalable greedy algorithm that near-optimally solves the submodular partitioning problem. We empirically demonstrate the efficacy of the proposed algorithm to obtain data partitioning for distributed optimization of convex and deep neural network objectives. Empirical evidences suggest that the intelligent data partitioning produced by the proposed framework leads to faster convergence in the case of distributed convex optimization, and better resulting models in the case of parallel neural network training.
منابع مشابه
Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications
We study two mixed robust/average-case submodular partitioning problems that we collectively call Submodular Partitioning. These problems generalize both purely robust instances of the problem (namely max-min submodular fair allocation (SFA) Golovin (2005) and min-max submodular load balancing (SLB) Svitkina and Fleischer (2008)) and also generalize average-case instances (that is the submodula...
متن کاملGraph Partitioning via Parallel Submodular Approximation to Accelerate Distributed Machine Learning
Distributed computing excels at processing large scale data, but the communication cost for synchronizing the shared parameters may slow down the overall performance. Fortunately, the interactions between parameter and data in many problems are sparse, which admits efficient partition in order to reduce the communication overhead. In this paper, we formulate data placement as a graph partitioni...
متن کاملSome Submodular Data-Poisoning Attacks on Machine Learners
The security community has long recognized the threats of data-poisoning attacks (a.k.a. causative attacks) on machine learning systems [1–6, 9, 10, 12, 16], where an attacker modifies the training data, so that the learning algorithm arrives at a “wrong” model that is useful to the attacker. To quantify the capacity and limits of such attacks, we need to know first how the attacker may modify ...
متن کاملENERGY AWARE DISTRIBUTED PARTITIONING DETECTION AND CONNECTIVITY RESTORATION ALGORITHM IN WIRELESS SENSOR NETWORKS
Mobile sensor networks rely heavily on inter-sensor connectivity for collection of data. Nodes in these networks monitor different regions of an area of interest and collectively present a global overview of some monitored activities or phenomena. A failure of a sensor leads to loss of connectivity and may cause partitioning of the network into disjoint segments. A number of approaches have be...
متن کاملPoseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines
Deep learning models, which learn high-level feature representations from raw data, have become popular for machine learning and artificial intelligence tasks that involve images, audio, and other forms of complex data. A number of software “frameworks” have been developed to expedite the process of designing and training deep neural networks, such as Caffe [11], Torch [4], and Theano [1]. Curr...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015